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Abstract 

We present an algorithm based on maximum likelihood for the estimation and renor- 
malization (marginalization) of exponential densities. The moment-matching prob- 
lem resulting from the maximization of the likelihood is solved as an optimization 
problem using the Levenberg-Marquardt algorithm. In the case of renormalization, 
the moments needed to set up the moment-matching problem are evaluated using 
Swendsen's renormalization method. We focus on the renormalization version of 
the algorithm, where we demonstrate its use by computing the critical tempera- 
ture of the two-dimensional Ising model. Possible applications of the algorithm are 
discussed. 

Key words: Maximum likelihood, Monte Carlo, renormalization, exponential 
densities, Levenberg-Marquardt algorithm 



1 Introduction 



There are many problems in science and engineering that involve multiple 
time-scales or multiple spatial scales or both. However, due to computer limi- 
tations or due to the macroscopic nature of the quantities of interest, one wants 
to integrate out many of the variables involved in the system. This procedure 
involves two steps: i) one needs to specify the density of variables for the full 
system and ii) integrate out the unwanted variables (this amounts to the com- 
putation of the marginal probability density for the remaining variables). Both 
problems are of equal importance. One needs a good approximation for the 
density of all the variables involved in the system. But even if this is available. 
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the computation of marginal probabilities can be costly. The present paper 
addresses both problems, focusing more on the second. Even though the algo- 
rithm is, in principle, equally applicable to the first problem, we will present 
a detailed study of the behavior of the algorithm for that problem elsewhere. 
Interest in the evaluation of marginal probability densities is present in many 
different settings ranging from graphical models (the inference problem) [1] to 
reductions of systems of differential equations [2,3]. 

The algorithm is based on maximum hkelihood estimation. We should men- 
tion here the algorithm of Geyer and Thompson [4] for the estimation of the 
parameters of an unknown exponential density, which is also based on maxi- 
mum likelihood but whose approach is different from ours (see also [5]). To the 
best of our knowledge, the algorithm that we present here for renormalization 
is the first one to be based solely on maximum likehhood (see also [2] for a 
different approach based on conditional expectations). As will be explained 
more below, the numerical implementation of the algorithm requires the so- 
lution of an optimization problem. The solution of the optimization problem 
through the Levenberg-Marquardt algorithm is efficient and robust. It, also, 
avoids some problems associated with more conventional methods like steepest 
descent and Newton's. The advantage of the Levenberg-Marquardt algorithm 
becomes especially crucial in the case of renormalization, where one needs to 
determine very accurately the parameters of the renormalized (marginal) ex- 
ponential density. For the example we present (the 2D Ising model of spins), 
this allows the accurate determination of the critical temperature. 

Exponential densities are, partly due to their nice mathematical features, 
widely used in the modeling of densities of systems of interacting variables 
in different contexts, ranging from Hamiltonian systems to image processing 
and bioinformatics (see [6] and references therein). As a result, there is an 
increased interest in algorithms for estimating and manipulating such densi- 
ties numerically. What we offer here is a general algorithm that allows the 
estimation of parameters of exponential densities. In addition to estimating 
the parameters of an exponential density, and this is the main focus in the 
present work, it allows the renormalization of a known exponential density. 
Renormalization amounts to calculation of marginal densities in a way that 
the functional form of the density is retained. Since we want to retain the 
mathematical structure of the density, the marginal density that we compute 
will not always be exact. There is a trade-off between efficiency and accuracy 
in such calculations that is well known in the context of real-space renormaliza- 
tion in statistical physics [7,8]. For the test case presented here (the 2D Ising 
model of interacting spin variables), this does not turn out to be harmful. 

Suppose we are given a number of independent samples from a density and we 
try to fit an exponential density to these samples by maximizing their likeli- 
hood. Whether we are looking for the parameters of an unknown exponential 
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density or the rcnormalizcd parameters of a known exponential density, max- 
imum likelihood estimation of these parameters leads to a moment-matching 
problem. In other words, we want to determine the parameters of an expo- 
nential density so that a finite number of its moments match the moments 
computed from the given samples. In the present context, the word moment 
stands for the empirical average (expectation value) of a, not necessarily poly- 
nomial, function of the random variables. For the problem of estimating the 
parameters of an unknown exponential density, the moments needed to set 
up the moment-matching problem are computed by using the samples of the 
unknown density to which we are trying to fit the exponential density. For the 
problem of estimating the renormalized parameters of a known exponential 
density, the moments needed to set up the matching problem are computed 
using Swendsen's renormalization method [8]. In both cases, the equations 
that define the moment-matching problem contain, in general, nonlinear func- 
tions of the parameters to be estimated. We solve the matching problem as an 
optimization problem using the Levenberg-Marquardt algorithm (see e.g. [9]). 
The Levenberg-Marquardt algorithm is a combination of Newton's method 
and the method of steepest descent. The reason for using this algorithm is 
that it allows fiexibility in the initial guess of the parameters. Use of the 
steepest descent algorithm alone can lead to slow convergence, while use of 
Newton's method alone can lead to divergence, because most likely the initial 
guess of the parameters is not close to their true values. 

The paper is organized as follows. In Section 2 we present the algorithm for 
the estimation of the parameters of an unknown exponential density. We also 
present the modifications needed to renormalize a known exponential density. 
In Section 3 the algorithm is used, first to estimate, and then to renormalize the 
parameters of the two-dimensional Ising model of ±1 spins. The estimates of 
the renormalized parameters are used to locate the critical temperature of the 
model. In the final section, we discuss possible applications of the algorithm. 



2 The algorithm 



In this section we present an algorithm that allows the estimation of param- 
eters of exponential densities. We examine two cases. In the first case we are 
given a number of samples from a multivariate density and we estimate the pa- 
rameters for an exponential representation of the density. In the second case, 
we are given the parameters of a multivariate exponential density and we are 
presented with the problem of computing marginal probabilities so that the 
form of the density remains the same. 
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2. 1 Estimation of parameters for an unknown exponential density 

We begin our presentation with a few facts about families of exponential 
densities and convex analysis (see [10,11,6]). Let x = (xi, . . . , a;„) be an n- 
dimensional random vector taking values in C M". The set S"^ = Si x S2 x 
. . . S„, where G Si, . . . , a:„ G S„. Also, let ^fc(x). A; = 1, . . . , / be a collection 
of functions of x. The functions tj^k are known as potentials or sufficient statis- 
tics. Let -0 = [ipi-, • • • , ^0 be the vector of potential functions. Associated with 
the vector ■0 is a vector a — (cti, ...,«;) whose elements are called canonical 
or exponential parameters. The exponential family associated with is the 
collection of density functions (parametrized by a) of the form 



where (q;,-?/'(x)) = I]a;=i cifc'0fc(x) and Z{a) — /=„ exp(— (a, '0(x)))o?x. The 
exponential family is defined only for the set 



If A is open, the exponential family is called regular. We will restrict our 
attention to regular families, so that in all the theorems stated below the 
assumption of regularity will be implied. Usually the exponential density is 
defined without the minus sign, but we incorporate it in anticipation of the 
case of Hamiltonian systems like the Ising model that we examine in Section 
3. In that case we have i?(x) = («,-?/'(x)), where i?(x) is the Hamiltonian 
of the system and thus, the exponential density is the Boltzmann density. In 
the case where the variables Xi,i = 1, . . . ,n take only a discrete number of 
values, the integration is replaced by summation over the possible values for 
each of the Xj. If the functions ■0fc(x) are linearly independent, the represen- 
tation is called minimal. Otherwise, it is called overcomplete. The distinction 
between minimal and overcomplete representations will be used later when we 
formulate the moment-matching problem. 

Suppose that we are given a collection of N independent samples of an n- 
dimensional random vector x. In general, we do not know which density the 
samples are drawn from. There are many examples in practical applications 
where the random vector comes from a exponential density (see [6]). However, 
even if we do not know that the samples are drawn from an exponential density, 
we can try to fit an exponential density to the samples. This will become 
clearer when we formulate the moment-matching problem and exploit some 
properties of the exponential densities. The basic idea behind the algorithm we 
present here is to estimate the unknown parameter vector a by maximizing the 
likelihood function of the samples. For a collection of N independent samples 



p(x, a) 



exp(-(tt,-0(x))) 
Zia) 



(a) < 00}. 



4 



of the random vector x, the hkehhood function L is defined as (see e.g. [12]) 



N 

where p{'Xj.a) is the unknown exponential density whose parameters a we 
wish to determine. We associate a potential function ip^, k = 1, . . . ,1 with every 
parameter ak- Maximization of L with respect to the parameters ak produces 
an estimate a for a. Under suitable regularity conditions, the sequence of 
estimates a for increasing values of is asymptotically efficient and tends, 
with probability one, to a local maximum in parameter space. From now on 
we will use the notation a instead of a to denote the maximum likelihood 
estimate of the parameters keeping in mind that this is only an estimate of 
the parameters. In addition, we will be working with the logarithm of the 
likelihood logL, since it does not alter the position of the maximum and also 
leads to formulas that are more easily manipulated. Differentiation of logL 
with respect to the results in 

1 ^ 

Ea[M^)] = ^J2M^j): k=l,...J (1) 

where 

TP n f \^ /a"V'fc(x)exp(-(a,^(x)))o?x 
Jgn exp(— (a, -(/^(xj^jctx 
is the expectation value of the function ipk with respect to the density p(x, a). 
The right side of (1) is the average (moment) of the function ipk as calculated 
from the given samples. The I equations in (1) define the moment-matching 
problem. What wc want to do is to estimate the parameters a so that the 
conditions in (1) are satisfied. The question is whether such a problem has a 
solution, and if it does whether it is unique. To answer this we resort to the 
theory of exponential densities and convex analysis. 

First we should note that the moments of the potential functions define an 
alternative parametrization of the exponential family. This is known as mean 
parametrization. In fact, let e be the vector of moments of the potential 
functions. Also, define the set M as 

M = {// e K'|3p(-) : V(x)p(x)(ix = /x}. 

Note that M is a convex set and that in the definition of M we do not restrict 
the density p(-) to the exponential family. The density p(-) is any density that 
realizes /i. Typically, the exponential family {p(x, q;)|q; e A} is only a strict 
subset of all possible densities. Since we are interested in exponential densities, 
we have to find the relation between the set A of admissible parameter vectors 
and the set M. This will allow us to answer the question whether the moment- 
matching problem admits an exponential density as a solution. 
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For a given vector of potentials ijj, define the mapping A : A — > M as 



Whether there exists a parameter vector a satisfying (1) depends on the prop- 
erties of the mapping A. In particular, it depends on: i) if A is one-to-one and 
hence invertible on its image and ii) what is the image of A under A. There 
are two theorems that characterize the properties of A (see [6]). For the first 
theorem we need the distinction between a minimal and an overcomplete rep- 
resentation. 

Theorem 1 The mapping A is one-to-one if and only if the exponential rep- 
resentation is minimal. 

For the second theorem we need the definition of the relative interior of a set. 

The relative interior of a set is the interior taken with respect to its affinc hull. 
A key result from convex analysis states that for any non-empty convex set 
its relative interior is non-empty. 

Theorem 2 The mapping A is onto the relative interior of M. 

Theorem 2 in conjunction with Theorem 1 guarantees, for minimal exponential 
representations, the existence of a unique parameter vector for each point in 
the relative interior of M. Of course, there is the question of what happens 
for points in the closure of M that are not in the relative interior. To answer 
that we use one more result from convex analysis. 

Theorem 3 Let M be a convex set in MJ. Let x a point in the relative interior 
of M and y a point in the closure of M. Then Xx -\- {1 — X)y belongs to the 
relative interior of M for < A < 1. 

As we have mentioned before, the exponential family typically describes only 
a strict subset of all possible densities that give rise to the set M. However, 
Theorems 1-3 tell us that this is enough for the moment-matching problem. 
In the case of an overcomplete representation, there is no longer a one-to- 
one correspondence between A and A(A). But this is not a problem. For an 
overcomplete representation, the solution for the moment-matching problem 
is no longer unique but it exists. Any of the solutions are equally admissible, 
since all of them reproduce the same moments. 

Now that we have defined the moment-matching problem we have to find 
a way to actually estimate the parameter vector a. The equations (1) con- 
tain, in general, nonlinear functions of the parameters. Moreover, except for 
very special cases, these nonlinear functions are unknown or very difficult 
to manipulate analytically. Thus, we have to tackle the problem of estimat- 
ing the parameter vector numerically. We can define the Z-dimensional vector 
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f(Q;) = {fi{a), h{ot), fi{a)) as 



1 ^ 

fk{a)^E^[M^)]--^Mxj), k^l,...,l (2) 

i=i 

The moment-matching problem amounts to solving the system of (nonlinear) 
equations /^(a) = 0,k = Two popular candidates to perform such 

a task are the method of steepest descent and Newton's method. However, 
both have their drawbacks. The method of steepest descent converges but 
can have very slow convergence, while Newton's method converges quadrati- 
cally but it diverges if the initial guess of the solution is not good. We choose 
to solve the moment-matching problem as an optimization problem using the 
Levenberg-Marquardt (LM) algorithm (see e.g. [9]). This is a powerful iterative 
optimization algorithm that combines the advantages of the method of steep- 
est descent and Newton's method. First, let us write the moment-matching 
problem as an optimization problem. Define the error function e{a) as 

^ k=i ^ k=i 

where = fkicx)- The problem of minimizing e{a) is equivalent to solving the 

system of equations fk{a) = 0, k = 1, . . . ,1 i.e. the zeros of e are solutions of 
the system /^(a) = 0,k = 1, . . . ,1 and vice versa. The LM algorithm uses a 
positive parameter A to control convergence and the updates of the parameters 
at step 171 + 1 are calculated through the formula 

a^+i = - [J^ J + Xdiag{j'^J)]-^j'^i{a"'), (3) 

where J = ^|q=q™, ^, j = 1, . . . , Hs the Jacobian of f (a™) and its trans- 
pose. The matrix diag{J^J) is a diagonal matrix whose diagonal elements 
are the diagonal elements of (J^J)- In the literature, the name Levenberg- 
Marquardt is also used to denote the algorithm in (3) with diag{J^J) replaced 
by the unit matrix /. For the case where we use the unit matrix / instead of 
diag{J^ J), it is straightforward to see the connection with the methods of 
steepest descent and Newton's. For A = the algorithm reduces to Newton's 
method, while for very large A we recover the steepest descent method. The 
modification (due to Marquardt) of using diag{J^J) becomes important in 
the case where A is large. In this case if we only used the unit matrix / almost 
all information coming from {J^J) is lost. On the other hand, since (J^J) 
provides information about the curvature of e, use of the matrix diag{J^J) 
allows us to incorporate information about the curvature even in cases with 
large A. In the numerical simulations we used both forms of the algorithm. 
The form in (3) gave superior results. 

We have to prescribe a way of computing the Jacobian J(q;™). The element 



7 



Jij of the Jacobian is given by 




for i,j = l,...,l (note that the Jacobian is symmetric) So, all the quantities 
involved in equation (3) can be expressed as expectation values with respect to 
the m-th step parameter estimate a™. We compute these expectation values 
using the Metropohs Monte Carlo algorithm. This can make the algorithm 
expensive since the density has to be sampled at each step. However, the cost 
of the algorithm can be reduced by parallelization of the Monte Carlo sampling 
procedure. Also, note that if one uses as potential functions (non-orthogonal) 
polynomials, the condition number of the Jacobian matrix grows fast with 
the order of polynomials. An ill-conditioned Jacobian can lead to catastrophic 
errors in the evaluation of the parameter vector. This is especially crucial 
in the renormalization version of the algorithm, where one wants to use the 
parameter vector to look for possible phase transitions and their associated 
critical properties. This point will become clearer in Section 3 where we present 
results for the 2D Ising model of spins. 

We conclude this section with some comments about the value of A and the 
convergence criterion Tiscd to stop the iterative process. Since A acts as a 
regulator between the steepest descent and Newton aspects of the algorithm, 
its value should be determined in a way that brings out the advantages of the 
two methods. The starting value of A was chosen to be 1. When we detected a 
streak of a few error-decreasing steps, the value of A was decreased by a factor 
of 10. On the other hand, if the error increased, we repeated the step with the 
value of A increased by a factor of 10. We also need to prescribe a convergence 
criterion. We used the relative error criterion, i.e. the algorithm is stopped if 
I i^new — ^oid) / ^oid \ < RTOL. The valuc of RTOL is determined by the accuracy 
of the Monte Carlo sampling. The relative error criterion is not enough on its 
own because there is the possibility of a very small step while the algorithm 
is still far away from the minimum of the error. Such a small step would 
pass the relative error criterion and the algorithm would stop. To avoid such 
false convergence, we added an extra convergence check criterion. Whenever 
the relative error criterion was satisfied, we checked that the error value was 
acceptable under the absolute value criterion max J(\ek\/2)/\Lik\ < ATOL 



into account the magnitude of the moment. Also, ATOL cannot be smaller 
than the accuracy afforded by the Monte Carlo sampling. If the error did not 
pass the absolute error criterion, the algorithm was not stopped even if the 
relative criterion was satisfied. Finally, we added a constraint on the total 
number of iterations allowed and on the maximum value of A. 
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2.2 Renormalization (marginalization) of a known exponential density 



We present the necessary modifications to the above scheme in the case when 
we know the parameters of an exponential density and we want to compute 
marginal densities. Suppose that we know the parameter vector a for a given 
vector '?/'(x) of potential functions for an n-dimensional random vector x. 
The exponential density associated with these parameters and potentials is 
^^^^~i"a)'^^^^^ • Suppose that we want to retain only the first m variables. The 
vector X can be written as x — (x, x) , where x is the vector of the first m 
variables and x is the vector of the remaining n — m variables. Denote the ex- 
ponent as i?(x) = {a, ipi^x)) (this is just a notation, it does not imply that we 
are only considering Hamiltonian systems) . The problem of finding a function 
H{x) such that 



exp(-^(x)) = / exp(-//(x))(ix 



is well-defined, at least for a vector x of finite dimensionality. However, as we 
have already mentioned in the introduction, we not only want to compute ex- 
ponential marginal densities, but do so while retaining the mathematical form 
of the exponent. This means that we want the marginal density's exponent 
if(x) to have the form H{x) = (a,-?/'(x)). The functions ^|Jk{'^), k = 1,. . . ,1 
have the same form as the functions ipki^), but arc defined only over the m 
variables. We can think of the representation of the exponent through the 
potential functions as an expansion of the functions H and H in functions of 
x and X respectively. The functions H and H are expanded using the same 
form of potential functions (on different sets of variables). It is clear that the 
way we have chosen to represent the marginal density can result in this den- 
sity being only approximate. The reason is that the marginal density may 
involve potential functions in addition, or different, from the potential func- 
tions that we have chosen. How much of a problem this will turn out to be, 
depends on our choice of potential functions. If one has no physical intuition 
or prior knowledge about the form of the potential functions, one can use as 
potential functions polynomials in the reduced vector x. For example, if the 
random vector is continuous, one could use products of one-dimensional Leg- 
endre polynomials. Usually, one augments the vector of potential functions of 
the original density by adding components with the corresponding parameters 
set to zero. The additional parameters may or may not acquire nonzero values 
for the marginal density, depending on how well we choose them. 

Thus, the problem we are addressing is whether we can find a parameter vector 
a such that 

exp(-(Q;,'0(x))) = / exp(-(Q;,'0(x)))dx. (4) 

J'Sn—m 
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If (4) holds, then the normahzation constant Z satisfies Z{a) — Z{a), since 



Z{a) — / exp(— (a, '0(x)))(ix = / / exp(— (a, ■0(x)))dx dx 

= j exp(— (a, V'(x)))(ix = Z(q;) 

Equation (4) can be used again to compute the marginal density for a subset 
X of X and so on. Also it can be applied to more general groupings of variables 
in X, e.g. the renormalization scheme that we will use in Section 3 (see also 
[7]). 

Now we turn to the problem of estimating the parameter vector a. Since the 
marginal density is also of exponential form, we can apply the algorithm of 
Section 2.1. The resulting moment- matching problem is 

1 ^ . 

Ea[^k{y^)]^ —Y^il^ki^j), k^l,...,l (5) 

where 

/gm exp(— (d, ip{x.)))dx. 

is the expectation value of with respect to the density cxp(— (a, ip(x))) /Z(a). 
If we look at the equations (5), we see that we need samples of the random 
vector X. This can be effected through Swendsen's observation [8], that the 
marginal density can be sampled without knowing its explicit form. Since we 
know the density for the vector x, we can sample it using e.g. Metropolis 
Monte Carlo and obtain samples of the vector x by keeping only the first m 
variables of each vector x. After we obtain the samples of the vector x we 
can apply the rest of the algorithm as it is and estimate the parameter vector 
a. This procedure can be performed recursively (as is done e.g. in real-space 
renormalization [7]), and thus obtain a "parameter" fiow which contains use- 
ful information about the system. This is done in the next section for the 
two-dimensional Ising model of spins. 

Note that if we are interested only in producing samples of the reduced vector 
X, Swendsen's method suffices. However, to use Swendsen's method we have 
to compute first samples from the n-dimensional vector x and this can be very 
costly. What we offer here is a way of representing the marginal density for 
the reduced vector x with an analytical formula. All that we need to describe 
the marginal density is the parameter vector a. This is highly desirable if e.g. 
one wants to sample the marginal density in the future or compute conditional 
expectations with respect to the marginal density. Suppose that we want to 
compute the conditional expectation of a function /i(x) with respect to a 
subset of the (already) reduced vector x. Let the reduced vector be spht up 
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as X = (x''^,x^) where x-*^ is mi-dimensional and x^ is of dimension m2 = 
m — mi. Suppose that we want to compute the conditional expectation of 
/i(x) conditioned on x^. Since wc know the analytical expression for p(x, a), 
this amounts to the calculation (e.g. using Monte Carlo) of the quantity 



where the second equality follows from (4). If we did not have an analytical 
formula we would have to sample the original n-dimensional vector fixing the 
values of x^, i.e. use the second equality in (6). This can be very costly when 
n is large (as it happens usually in applications). On the other hand, the com- 
putation in the first equality in (6) is much cheaper. This can be useful e.g. 
in the inference problem for a graph with cycles (in the context of graphi- 
cal models), where one can compute the marginal densities on the different 
cliques (fully-connected clusters of nodes) and store the parameter vectors for 
further calculations within the individual cliques. In fact, the process can be 
parallelized, by assigning a processor to one (or a few) cliques. It would be in- 
teresting to see how this algorithm compares with the junction tree algorithm 
(see [13]) whose complexity grows exponentially with the size of the maximal 
clique. The present algorithm can be applied directly on the clique tree (an 
acyclic graph whose nodes are formed by cliques of the original graph) with- 
out the need for triangulation. It is also interesting to see how the present 
algorithm compares with variational inference methods [6]. 

From equations (1) and (5), we see that the problem of maximum likelihood 
estimation of the parameter vector for both the full system and the reduced 
system is transformed into the same moment-matching problem, i.e. involving 
the same number and same form of potential functions. The only difference is 
in the dimensionality of the random vector involved. 



3 Numerical results for the 2D Ising spin model 

In this section we apply the algorithm from Section 2 to the 2D Ising spin 
model. Since the system has a known exponential probability density, we can 
check the accuracy of the algorithm in computing this density starting from a 
different initial guess. Also, the Ising model exhibits a phase transition with 
known properties. We use the "renormalized" version of the algorithm (Section 
2.2) to compute the critical temperature. 

Consider a square lattice of size L. We denote a lattice site as Ik — {ik:jk) 





I, 



.+m2 exp(— (q!, ■0(x)))(ix(ix^ 
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where ik,jk arc integers and k — 1, . . . (we assume that the sites are hsted 
in a convenient order). We associate with each lattice site a (spin) variable 
Xk that can only take the values ±1. The variables on the whole lattice will 
be denoted by x = (xi, . . . ,Xl2). The lattice is periodic with period L. The 
Ising model of interaction of the variables (see e.g. [7]) is defined through the 
Hamiltonian 

^ <i,j> 

where <> means summation only over nearest neighbors. The parameter T 
is interpreted as the temperature in the context of statistical mechanics. The 
model's probability density function is 

where Z is the normalization constant (partition function). The Ising model 
has been used extensively to study phase transitions (as a function of the tem- 
perature) in magnetic materials. According to Onsager's analytical solution, 
the average magnetization m — E[Y,jLi xj/L"^] exhibits, in the limit of L — > oo, 
a transition from zero to non-zero values for temperatures T <Tc — 2.269. 

We can cast the probability density of the Ising model in the form of an 
exponential density as the ones described in Section 2. To do that we will 
need to define groups of variables around a site, say / = (i,i). These groups 
have nothing to do with the groupings of variables that we will use later for 
real-space renormalization. The groups are defined as follows: group 1 contains 
only xi. Group 2 contains the variables whose distance from 7 is 1 (the nearest 
neighbors), group 3 contains those variables whose distance is \/2, group 4 
contains the variables whose distance is 2 etc. We use the members of the 
groups to define the corresponding collective variables X/ ^ as 

Xi,k ^ — Y xj, 

group k 

where Uk is the number of variables in group k. These collective variables 
can be used to form translation-invariant polynomials in x, for example, 
X]/=i -Y/,i(^7,fe)^. Using this notation, the Hamiltonian for the Ising model 

can be written as H = —j; J^fLi ^ 1,1^1, 2 = |^(~ S/=i ^ 1.1^1.2^ For the case 
of the Ising model, we will include a minus sign in the definition of all the 
potential functions. The thermodynamic properties are the same whether or 
not we include the minus sign [7]. In the numerical simulations we used the 
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following 8 potential functions 



for A; = 1,2,3,4,5 



J 



J2{Xj,k+i)^ for A; = 1,2 



(7) 



J 



^8 



J 



where the summation extends over the whole lattice (sec also [2]). The Hamil- 
tonian for the Ising model can be written as if(Q;,x) = J2k=i'^ki^k{'^), with 
ai — 2/T and = for Z = 2, . . . , 8. The coefficients for the potential func- 
tions tp2 ■ ■ ■'ips acquire nonzero values when we compute marginal densities for 
subsets of X. 

The algorithm we described in Section 2 can be used to estimate the parame- 
ters of an unknown exponential density or to renormalize a known density. As 
a test of the first aspect of the algorithm we compute the parameters for the 
Ising model. In other words, we prescribe some initial values for the parame- 
ters ak, k = 1, . . . , 8 and we apply the algorithm. The parameters computed 
by the algorithm should converge to the true values cki = 2/T and ai — 
for Z = 2, . . . , 8. We should note here, that the representation we are using is 
not minimal and thus, there are more than one admissible sets of parameters 
for the same moments. In the case of the Ising model, if we want to recover 
the values ai = 2/T and ai = for / = 2, . . . , 8 we should start close to 
these values to ensure that we are in their basin of attraction. Of course, any 
set of parameters that is produced from a convergent run of the algorithm is 
equally acceptable. In Fig.(l) we compare the value of the parameter ai as de- 
termined by the algorithm with its true value 2/T for different temperatures. 
The initial guess was 1.9/T for ai and zero for the rest of the parameters. 
The results shown are for 10^ Monte Carlo steps per spin (each step involves 
a sweeping of the whole lattice). The agreement for cci shown in Fig.(l) is 
within the accuracy of the Monte Carlo computation. The values of the rest 
of the parameters 0:2, . . . , ag, remain zero to within the accuracy afforded by 
the Monte Carlo sampling. The relative and absolute error tolerances were set 
to 0.001 and the calculations were performed using a 40 by 40 lattice. The 
algorithm converged after 5-7 iterations depending on the temperature. The 
condition number for the Jacobian matrix was about 10^ for all temperatures. 

We now turn to the main focus of the present paper, which is the computation 
of marginal densities for known exponential densities. Suppose that we begin 
with an Ising spin square lattice of size L. We split the vector of variables x 
as X = (x, x). The vector x contains the variables whose marginal density we 
want to compute and x is the vector of variables we wish to eliminate. We use 
the same 8 potential functions as above. The exponential density for the vector 
X is given by exp(— if(Q;, x)), where the Hamiltonian H{a, x) = X]fc=i ctfc^fc(x). 



13 



0.900 



Q. 

<a 

c 
c> 

|o 

o 
O 



0.800 



> ML algorithm 
I Ising model 



0.700 I ' ' ' ' 1 

2.240 2.250 2.260 2.270 2.280 2.290 

Temperature 

Fig. 1. Comparison of the value of the parameter ai as determined by the algorithm 
with its true value. 

with tti = 2/T and ai = for / = 2,..., 8. We can apply the algorithm 
of Section 2.2 and obtain the density exp{—H{a,yi))/Z{a) for the vector x. 
The Hamiltonian is H{a,'k) — Y%=iOik'4^k{'^), where the parameter vector a 
contains the new values that were computed from the algorithm. Of course, 
once the density for x is known, we can repeat the procedure and find the 
density for a subset of the vector x and so on. We can make this recursive 
elimination of variables more systematic. Suppose that we start with a lattice 
of size L X L with L even. We denote the vector of spins corresponding to this 
lattice as x*^°\ We construct 2x2 blocks of variables and we represent each 
block by one variable, say the lower left-hand corner variable of the block. 
The vector of the new variables is denoted by x*^^^ and occupies a lattice 
of size L/2 x L/2. There are different ways of assigning values to the new 
variables. Here we pick the "majority" rule. This means that if the sum of 
spins in a block is positive, the new variable is assigned the value of 1, if the 
sum of spins is negative the value -1 . In the case of a tie, the value of the 
the new variable is taken to be the value of the spin on the lower left-hand 
corner. Then we apply the algorithm and we obtain the density for the new 
variables. As we have already mentioned, we are not sure beforehand, that the 
potential functions we choose are enough to describe all the couplings that may 
appear among the new variables. Thus the marginal density that we compute 
is usually only an approximation to the marginal density of the new variables. 
Once the density for the vector x^^^ is computed, we can repeat the coarse- 
graining process and obtain the density for the vector x^^^ which occupies 
a lattice of size L/4 x L/4 and so on. The corresponding parameter vectors 
a^'^\ a^^\ a^'^\ . . . define a "fiow" in the space of parameters. As is known [7], 
the flow of parameter vectors contains useful information for a system near its 
critical point. In addition, we can store the parameter vectors for the different 
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Fig. 2. Second moment for successive renormalization steps for different tempera- 
tures. 

marginalization steps for future use. 

We use the parameter vectors for the successive renormahzation steps to locate 
the critical temperature of the Ising model. For temperatures T < Tc the 
renormalization steps couple ever more distant spins while for T > Tc the 
spins become successively decoupled. We can measure this successive coupling 
or decoupling by focusing on the quadratic terms (the terms — Y.j Xj^iXj^kJ^i) 
in the expansion of the Hamiltonian and calculating the "second moments" of 
their associated parameters 

k=i 

The superscript j is used to denote the parameters at the j-th renormalization 
step and Mg^'' = is the second moment at the fine level (j = 0). The pa- 
rameters dki k — 1, . . . ,lq denote the distance from J of the spins in the group 
k, while Iq is the number of quadratic terms in the expansion of the Hamilto- 
nian. For our experiments Iq = 5 (see the definition of potential functions in 
(7)). In Fig. (2) wc show the evolution of for j = 0, 1, .... 4 and different 
temperatures. The fine level lattice is 80 by 80 and each renormalization step 
(based on the majority rule) decreases the number of spins by a factor of 4. For 
all the renormalization steps, the parameters were initialized at their values at 
the fine level. This means, that for ? = 1, . . . , 4 we started the algorithm with 
orl =2/T and =0, A; = 2, .... 8. As expected from the analytical results, 
below the critical temperature the second moment increases with successive 
renormalization steps, while for temperatures above the critical one, the sec- 
ond moment decreases with each renormalization step. The temperature for 
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Fig. 3. Second moment for successive renormalization steps for temperatures close 
to the critical. 

which the second moment remains constant (within the accuracy afforded by 
Monte Carlo samphng) should be the critical temperature. For our choice of 
potential functions and 10^ Monte Carlo steps per spin, the critical temper- 
ature is found to be ~ 2.275, an error of about .3%. This can be seen in 
Fig. (3) where we have focused on a tighter temperature interval around the 
critical temperature. For T — 2.275, the second moment remains constant to 
within the accuracy of the Monte Carlo samphng. The relative and absolute 
tolerances were set to 0.001. The Levenberg-Marquardt algorithm converged 
after 5-12 iterations depending on the temperature and the renormalization 
step. As before, the condition number for the Jacobian matrix was about 10^ 
for all temperatures and renormalization steps. 



Conclusions 

We have presented an algorithm for the estimation and renormalization of 
exponential densities. The algorithm is based on the maximization of the like- 
lihood and results in a moment-matching problem. The matching problem is 
solved as an optimization problem with the Levenberg-Marquardt algorithm. 
For the case of renormalization, the moments for the reduced system are com- 
puted using Swendsen's renormalization method. We have exhibited the use 
of the algorithm by applying it to the Ising model, where it reproduces the 
analytical results with good accuracy. 

We hope that the algorithm can be useful in diverse areas where one needs to 
estimate an unknown density or marginalize a known density. Such areas range 
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from applications to graphical models (the inference problem) to reductions 
of systems of differential equations. In the latter case, we are planning to 
apply the algorithm to estimate the density of solutions produced by finite- 
difference or spectral formulations of partial differential equations that do not 
have a natural candidate for a density. An analytical expression for the density 
in such situations can be helpful in the process of constructing reduced models 
[3]. 
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